LEXINGTON


MASSACHUSETTS 


ABSTRACT 


A  method  of  density  estimation  is  proposed,  which  is  a 
rational  modification  of  orthogonal  expansions,  combined  with 
a  stopping  rule  determined  by  a  nearest  neighbor  statistic. 

This  method  yields  consistent  estimates  and  applies  (in  principle) 
to  density  estimation  in  any  number  of  dimensions. 




ON  NONPARAMETRIC  PROBABILITY  DENSITY 
ESTIMATION  USING  ORTHOGONAL  SERIES 


I.  INTRODUCTION 

Among the numerous non-parametric methods of estimating a
probability density function, the approximation of this density
by a finite Fourier series has several computational advantages.
Probably the most important of these is the fact that the
evaluation of this density at a new data point requires only the
storage of certain Fourier coefficients. One of the main
disadvantages of such an approximation of a density is the
difficulty of determining the number of terms in the expansion.

In this note, we propose an approximation which is a rational
function of a finite Fourier series. The number of terms in
this series depends, in a very natural way, on the nearest
neighbor error rate for the sample data when compared to a sample
drawn from a reference distribution. In Section III we show that
the method is consistent and in Section IV we remark on the
relevance of this method in hypothesis testing.
II. SECOND ORDER SOLUTION TO THE BINARY DECISION PROBLEM


Let $p_1$, $p_2$ be two Lebesgue measurable, bounded ($\le K$)
probability density functions on the unit cube, $I$, in $R^n$. We
assume further that $p_1 \ne p_2$ on some set of positive measure
in $I$ and that for some $\delta > 0$, $p_1 > \delta$ on $I$. Let
$\mathcal{L} = \{f \in L^2(I) : E_1 f = \int f p_1\,dx = 0,\ E_2 f = \int f p_2\,dx = 1\}$.
According to [1], a second order solution for an optimal
discriminant $\hat f \in \mathcal{L}$, for the binary hypothesis
test $H_1$: $X$ has density $p_1$ vs. $H_2$: $X$ has density $p_2$,
is a critical point for some real $\alpha$ of the functional


$$J_\alpha(f) = \alpha\,\mathrm{VAR}_1 f + (1-\alpha)\,\mathrm{VAR}_2 f \qquad (1)$$


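In fact if we restrict ourselves to the case $0 < \alpha < 1$ and
solve (1) for the unique (to within a null function) critical point
(and global minimum of $J_\alpha(f)$) for $f \in \mathcal{L}$, we
obtain by elementary variational calculus

$$\hat f = \frac{[(1-\alpha) - \lambda]\,p_2/p_1 + \lambda}{\alpha + (1-\alpha)\,p_2/p_1} \qquad (2)$$

with

$$0 > \lambda = \frac{(1-\alpha)\displaystyle\int \frac{p_2\,p_1}{\alpha p_1 + (1-\alpha) p_2}\,dx}{\displaystyle\int \frac{(p_2 - p_1)\,p_1}{\alpha p_1 + (1-\alpha) p_2}\,dx} = (1-\alpha) - J_\alpha(\hat f).$$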


It follows that $\hat f$ is rational and increasing in $p_2/p_1$
and hence optimal (by an adjustment of threshold) for minimum
total error (or Neyman-Pearson at level $\delta$) hypothesis testing.
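As a numerical illustration of (2), the following minimal sketch evaluates $\lambda$ and $\hat f$ by quadrature in one dimension. The densities `p1` (uniform) and `p2(x) = 2x` on [0, 1], the midpoint-rule helper `integrate`, and all function names are assumptions chosen for the example, not part of the original note.

```python
def integrate(g, n=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

def p1(x):
    return 1.0        # hypothetical reference density (uniform)

def p2(x):
    return 2.0 * x    # hypothetical bounded density, K = 2

def discriminant(alpha):
    """Return (lambda, f_hat) as given by (2) for the densities above."""
    def mix(x):
        return alpha * p1(x) + (1.0 - alpha) * p2(x)
    num = (1.0 - alpha) * integrate(lambda x: p2(x) * p1(x) / mix(x))
    den = integrate(lambda x: (p2(x) - p1(x)) * p1(x) / mix(x))
    lam = num / den
    def f_hat(x):
        r = p2(x) / p1(x)
        return ((1.0 - alpha - lam) * r + lam) / (alpha + (1.0 - alpha) * r)
    return lam, f_hat

lam, f_hat = discriminant(0.5)
E1 = integrate(lambda x: f_hat(x) * p1(x))   # constraint E_1 f = 0
E2 = integrate(lambda x: f_hat(x) * p2(x))   # constraint E_2 f = 1
print(lam, E1, E2)
```

In this example $\lambda$ comes out negative and the two constraints hold up to rounding, which is what (2) requires; since $p_2/p_1$ is increasing in $x$ here, `f_hat` is increasing in $x$ as well.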
III. DENSITY ESTIMATION


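For simplicity we consider only the case $\alpha = \tfrac12$. Similar
results may be obtained for other $\alpha$. Solving (2) for $p_2/p_1$
we obtain

$$\frac{p_2}{p_1} = \frac{\tfrac12 \hat f - \lambda}{(\tfrac12 - \lambda) - \tfrac12 \hat f} = \frac{\hat f - 1 + 2 J_{1/2}(\hat f)}{2 J_{1/2}(\hat f) - \hat f} \qquad (3)$$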
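The algebra leading to (3) is short. A sketch, writing $r$ for $p_2/p_1$ (a shorthand introduced here only for the derivation) and using the relation $\lambda = \tfrac12 - J_{1/2}(\hat f)$ stated with (2):

```latex
% (2) with alpha = 1/2 and r = p_2 / p_1:
\hat f\,\bigl(\tfrac12 + \tfrac12 r\bigr) = \bigl(\tfrac12 - \lambda\bigr)\,r + \lambda
\;\Longrightarrow\;
\tfrac12 \hat f - \lambda = r\,\bigl[(\tfrac12 - \lambda) - \tfrac12 \hat f\bigr]
\;\Longrightarrow\;
r = \frac{\tfrac12 \hat f - \lambda}{(\tfrac12 - \lambda) - \tfrac12 \hat f}.

% Substitute lambda = 1/2 - J_{1/2}(\hat f) and multiply numerator and denominator by 2:
r = \frac{\hat f - 1 + 2 J_{1/2}(\hat f)}{2 J_{1/2}(\hat f) - \hat f}.
```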


We now write

$$J_{1/2}(\hat f) = \tfrac12 + \frac{e_{nn}}{4\,(\tfrac12 - e_{nn})} \qquad (4)$$

where $e_{nn} = \int \frac{p_2\,p_1}{p_1 + p_2}\,dx$ is known as the limiting
nearest neighbor error rate, i.e., if we generate $n$ independent
class 1 samples from a distribution with density $p_1$ and similarly
$n$ class 2 samples from a distribution with density $p_2$ and then
classify new samples (drawn from class 1 or class 2 with equal
probability) as the class of the (a) nearest neighbor in the
original $2n$, then the classification error of this procedure
approaches $e_{nn}$ as $n \to \infty$ with probability 1. (See [2].)
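The nearest neighbor procedure just described can be simulated directly and compared with the integral defining $e_{nn}$. The sketch below is only an illustration under assumed inputs: $p_1$ uniform and $p_2(x) = 2x$ on [0, 1], a fixed random seed, and hypothetical helper names (`sample_p2`, `simulated_nn_error`).

```python
import bisect
import math
import random

def integrate(g, n=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))

def sample_p2(rng):
    """Inverse-CDF draw from the hypothetical density p2(x) = 2x on [0, 1]."""
    return math.sqrt(rng.random())

def simulated_nn_error(n, m, rng):
    """1-NN error with n training points per class and m test points."""
    train = sorted([(rng.random(), 1) for _ in range(n)] +
                   [(sample_p2(rng), 2) for _ in range(n)])
    xs = [x for x, _ in train]
    errors = 0
    for _ in range(m):
        label = 1 if rng.random() < 0.5 else 2
        q = rng.random() if label == 1 else sample_p2(rng)
        j = bisect.bisect_left(xs, q)
        # In one dimension the nearest neighbor is one of the two bracketing points.
        neighbours = [k for k in (j - 1, j) if 0 <= k < len(xs)]
        k = min(neighbours, key=lambda t: abs(xs[t] - q))
        errors += train[k][1] != label
    return errors / m

e_nn = integrate(lambda x: 2.0 * x / (1.0 + 2.0 * x))   # p1 = 1, p2 = 2x
J_half = 0.5 + e_nn / (4.0 * (0.5 - e_nn))              # formula (4)
e_sim = simulated_nn_error(1000, 4000, random.Random(0))
print(e_nn, J_half, e_sim)
```

Because $e_{nn}$ enters (4) through $\tfrac12 - e_{nn}$ in the denominator, small errors in an estimate of $e_{nn}$ are amplified when the two densities are close.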

We now make a final assumption that $p_1 \equiv 1$ on $I$. Again,
results analogous to the following will still hold provided $p_1$
is strictly bounded away from 0 in $I$.

Suppose we are given $n$ independent samples from a distribution
with density $p_2$. Let $1 = \varphi_1, \varphi_2, \ldots$ be a complete
orthonormal system for $L^2(I)$. Finally, let $\nu_n$ be the empirical
density determined by the $n$ sample points. Now, consider the
solution of the variational problem: minimize

$$J_{Nn}(f) = \tfrac12\,\mathrm{VAR}_1 f + \tfrac12\,\mathrm{VAR}_{\nu_n} f \qquad (*)$$

such that

$$f = \sum_{i=1}^{N} a_i \varphi_i, \qquad E_1 f = 0, \qquad E_{\nu_n} f = 1$$

where $N$ is determined by a "stopping rule". We then let $\hat f_n$
be the above minimizing $f$. Before describing the determination of
$N$, we show that the preceding variational problem has solutions
with probability 1 for large enough $N$.

Lemma. Assume $n$ is fixed. Then with probability one (*) has
solutions for large enough $N$. In fact $\min J_{Nn}(f) \to 0$ as
$N \to \infty$ with probability one.

Proof: Let $L$ be any positive integer and $\varepsilon > 0$. Then there
is an $N_0$ such that, for $N > N_0$, there are functions
$f_1, f_2, \ldots, f_L \in \langle \varphi_1, \ldots, \varphi_N \rangle$ with the properties:

(i)  \\'fi\\l<  £  +  e 

(ii) there exist disjoint subsets $A_1, \ldots, A_L$ with
$m\bigl(\bigcup_{i=1}^{L} A_i\bigr) > 1 - \varepsilon$ s.t. $x \in A_i$ implies
$|f_i(x) - 1| < \varepsilon$ and $|f_j(x)| < \varepsilon$ $(j \ne i)$.
Hence, with probability (w.r.t. $p_2$) $> (1 - K\varepsilon)^n$, each of our
samples $x_k$ will lie in some $A_i$. Let us now consider the function


f 


n 

E  v 

£=1 


1  -if 


l  1=1 


.  n 


n 


k=l 


E  \  <*k> 

«,=!  * 


dx 

£ 


Clearly $E_1 f = 0$, $E_{\nu_n} f = 1$. We have further


VAR.  f  = 


Hence, $J_{Nn}(f)$ becomes arbitrarily small as $N \to \infty$ with
probability arbitrarily close to one.

The solutions of (*) can be easily obtained by the method
of Lagrange multipliers. Since $\varphi_1 \equiv 1$, (*) is reduced to
solving the following for $a_i$:


$$\min\ \tfrac12 \sum_{i=2}^{N} a_i^2 + \tfrac12 \sum_{i,j=2}^{N} a_i a_j\,\theta_{ij}$$

such  that 
1
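The Lagrange-multiplier solution of a problem of this type has a closed form, sketched below in one dimension. Several pieces are assumptions made only for the illustration: the cosine basis `phi`, the reading $\theta_{ij} = \frac{1}{n}\sum_k \varphi_i(x_k)\varphi_j(x_k)$, the constraint $\sum_i a_i \cdot \frac{1}{n}\sum_k \varphi_i(x_k) = 1$ (that is, $E_{\nu_n} f = 1$), and the sample density $p_2(x) = 2x$. With $A = I + \Theta$ and $m_i$ the empirical mean of $\varphi_i$, the minimizer is $a = A^{-1}m / (m^T A^{-1} m)$.

```python
import math
import random

def phi(i, x):
    """Assumed cosine basis on [0, 1]: phi_1 = 1, phi_i = sqrt(2) cos((i-1) pi x)."""
    return 1.0 if i == 1 else math.sqrt(2.0) * math.cos((i - 1) * math.pi * x)

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            t = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= t * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def fit_coefficients(samples, N):
    """Return (a_2..a_N, minimum objective value) for the reduced problem."""
    idx = range(2, N + 1)
    n = len(samples)
    m = [sum(phi(i, x) for x in samples) / n for i in idx]
    A = [[(1.0 if i == j else 0.0)
          + sum(phi(i, x) * phi(j, x) for x in samples) / n   # assumed theta_ij
          for j in idx] for i in idx]
    z = solve(A, m)                                # z = A^{-1} m
    s = sum(mi * zi for mi, zi in zip(m, z))       # m^T A^{-1} m
    return [zi / s for zi in z], 0.5 / s

rng = random.Random(0)
samples = [math.sqrt(rng.random()) for _ in range(200)]   # draws from p2(x) = 2x
a, J = fit_coefficients(samples, 6)

def f_hat(x):
    return sum(ai * phi(i, x) for i, ai in zip(range(2, 7), a))

print(J, sum(f_hat(x) for x in samples) / len(samples))
```

Since the feasible sets are nested in $N$, the minimum value returned by `fit_coefficients` cannot increase as $N$ grows, which is the property a stopping rule on $N$ relies on.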


For the determination of $N = N_n$, we first estimate $J_{1/2}(\hat f)$ by

$$J_n = \tfrac12 + \frac{e^n}{4\,(\tfrac12 - e^n)}$$

where $e^n$ is the expected nearest neighbor error rate of the $n$
samples with the leaving-one-out method:

$$e^n = \frac{1}{n} \sum_{i=1}^{n} \left[1 - (1 - V_i)^{n-1}\right] \qquad (6)$$


where $V_i$ is the volume of the intersection of $I$ with a sphere
centered at $x_i$ and of radius equal to the distance between $x_i$
and its nearest neighbor in $\{x_j\}_{j \ne i}$. Now $e^n \to e_{nn}$ with
probability one as $n \to \infty$ and hence $J_n \to J_{1/2}(\hat f)$ with
probability one as $n \to \infty$. Let $N_n$ be an $N$ which minimizes
$|J_{Nn}(\hat f_n) - J_n|$. By the lemma such an $N$ exists with
probability one provided that $J_n > 0$ and this is true with
probability one as $n \to \infty$.
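A minimal one-dimensional sketch of the leaving-one-out estimate (6) and the resulting $J_n$ follows. It assumes $I = [0, 1]$, so that the "sphere" around $x_i$ is an interval clipped to $I$, and it draws the samples from the hypothetical density $p_2(x) = 2x$; the function names are illustrative only.

```python
import math
import random

def loo_nn_error(samples):
    """Estimate (6) for points in [0, 1]; V_i is a clipped interval length."""
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs):
        gaps = []
        if i > 0:
            gaps.append(x - xs[i - 1])
        if i < n - 1:
            gaps.append(xs[i + 1] - x)
        r = min(gaps)                               # nearest neighbor distance
        v = min(1.0, x + r) - max(0.0, x - r)       # volume of ball intersected with I
        total += 1.0 - (1.0 - v) ** (n - 1)
    return total / n

def j_estimate(e_n):
    """J_n from e^n; only meaningful (positive) when e_n < 1/2."""
    return 0.5 + e_n / (4.0 * (0.5 - e_n))

rng = random.Random(1)
samples = [math.sqrt(rng.random()) for _ in range(2000)]   # draws from p2(x) = 2x
e_n = loo_nn_error(samples)
print(e_n, j_estimate(e_n))
```

In finite samples `e_n` can land at or above 1/2 when $p_2$ is close to the reference density, in which case `j_estimate` is undefined or negative; this is the situation the proviso $J_n > 0$ excludes.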


The estimate we then use for $p_2$ is

$$p^n = \frac{\hat f_n - 1 + 2 J_n}{2 J_n - \hat f_n} \qquad (7)$$


If we should know the value of $K$, we may use the truncated
estimate

$$\tilde p^n = (p^n \vee 0) \wedge K \qquad (8)$$
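Formulas (7) and (8) amount to a pointwise transformation of the discriminant value. The sketch below applies it and checks, in a hypothetical exact case ($p_1 \equiv 1$, $p_2(x) = 2x$, with $\hat f$ and $J_{1/2}$ taken from (2) and (4) rather than estimated), that the transformation returns $p_2$; `density_estimate` and `f_exact` are names introduced for the example.

```python
import math

def density_estimate(f_value, J_n, K=None):
    """Formula (7); when K is given, also apply the truncation (8)."""
    p = (f_value - 1.0 + 2.0 * J_n) / (2.0 * J_n - f_value)
    if K is not None:
        p = min(max(p, 0.0), K)
    return p

# Hypothetical exact case: p1 = 1 and p2(x) = 2x on [0, 1].
e_nn = 1.0 - 0.5 * math.log(3.0)           # integral of 2x / (1 + 2x) over [0, 1]
J = 0.5 + e_nn / (4.0 * (0.5 - e_nn))      # formula (4)

def f_exact(x):
    """(2) with alpha = 1/2 and lambda = 1/2 - J."""
    r = 2.0 * x
    return (J * r + 0.5 - J) / (0.5 + 0.5 * r)

for x in (0.1, 0.5, 0.9):
    print(x, density_estimate(f_exact(x), J, K=2.0))
```

With estimated inputs the untruncated ratio can be negative or exceed $K$, which is the case the truncation in (8) handles.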

We  now  make  the  following  consistency  claim. 

Theorem. $p^n \to p_2$ in Lebesgue measure with probability one and
$\tilde p^n \xrightarrow{L^2} p_2$ with probability one.

Proof. From the form of (3), (7), (8) and the fact that
$J_n \to J_{1/2}(\hat f)$, it suffices to show that
$\hat f_n \xrightarrow{L^2} \hat f$ with probability one.

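Note that $\varphi_2, \varphi_3, \ldots$ are linearly independent and dense
in $\{f : \int f\,dx = 0\} \cap L^2(\tfrac12 + \tfrac{p_2}{2})$, where
$L^2(\tfrac12 + \tfrac{p_2}{2})$ denotes the set of square integrable
functions w.r.t. a measure whose density is $\tfrac12 + p_2/2$.
Now form a complete orthonormal basis $\xi_2, \xi_3, \ldots$ of
$\{f : \int f\,dx = 0\} \cap L^2(\tfrac12 + \tfrac{p_2}{2})$ where each $\xi_i$
is a linear combination of $\varphi_2, \varphi_3, \ldots, \varphi_i$. Let
$c_i = \int \xi_i\,p_2$. Then $\hat f = \sum_{i=2}^{\infty} b_i \xi_i$ where $b_i$
is the solution of $\min \sum_{i=2}^{\infty} b_i^2$ such that
$\sum_{i=2}^{\infty} c_i b_i = 1$. This is

$$\hat f = \frac{\sum_{i=2}^{\infty} c_i\,\xi_i}{\sum_{i=2}^{\infty} c_i^2}.$$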


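Similarly, we form a complete orthonormal basis $\eta_2^n, \eta_3^n, \ldots$
of $\{f : \int f\,dx = 0\} \cap L^2(\tfrac12 + \tfrac{\nu_n}{2})$ with each
$\eta_i^n$ a linear combination of $\varphi_2, \varphi_3, \ldots, \varphi_i$. Let
$d_i^n = \int \eta_i^n\,\nu_n$. The solution of (*) is given by

$$\hat f_n = \frac{\sum_{i=2}^{N_n} d_i^n\,\eta_i^n}{\sum_{i=2}^{N_n} (d_i^n)^2}.$$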


Clearly, $d_i^n \to c_i$ and $\|\eta_i^n - \xi_i\| \to 0$ with probability one
as $n \to \infty$ for each $i$. Since


1(1  -  (i  (di"Th° 


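with probability one as $n \to \infty$, for each $N$, it follows that

$$N_n \to \infty$$

with probability one as $n \to \infty$, and hence
$\sum_{i=2}^{N_n} (d_i^n)^2 \to \sum_{i=2}^{\infty} c_i^2$
with probability one as $n \to \infty$. Finally, for any $\varepsilon > 0$
pick $M$ such that $\sum_{i=M}^{\infty} c_i^2 < \varepsilon$. Then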
M  1 

Nn 

lim  II  fn~f  II  2  -  II  13  d.n  n  n  ||  +  ||  13  c.  C.IL 

with  probability  one.  But 


g  II  2  *  J  2  II  g II  and  ||  g  ||2  <  /T  ||  g  ||  v 


H1 